
The objective of this blog is to design a neural network model to predict bank customer churn.
In other words, we want to predict whether customers will remain with a bank or opt out of its banking services in the next 6 months. This is a binary classification problem.
Data Description:
The case study is from an open-source dataset from Kaggle.
The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore,
Geography, Gender, Age, Tenure, Balance etc.
Link to the Kaggle project site:
https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling
The input dataset contains 14 columns, out of which 13 are used as Independent features, and the last one is the Dependent feature.
Customer churn is the problem of customers leaving your product or subscription and moving to another service. Because churn has a direct effect on profit margins, businesses now try to identify customers who are at risk of churning and retain them with personalized promotional offers. To retain these customers, a business needs to identify both who is likely to churn and why, so that it can respond with personalized offers and products.

The aim of our project is to solve this problem for the banking domain: identify which customers are at risk of churning, and the reasons for churning, with the help of data mining and machine learning algorithms. The project focuses on 2 deliverables - predicting customers likely to churn using supervised learning classification algorithms, and segmenting customers using unsupervised learning to validate the similarities in the 'likely to churn' subset and come up with distinct segments. The reasons for a particular customer's churn can involve both internal and external factors; we will try to understand the internal factors using explainable AI, which breaks into the black box of machine learning algorithms and gives a clear explanation of the predictions.
• Analyze the underlying distribution of the users who are about to leave, and perform customer segmentation on the likely-to-churn customers.
• Identify the reasons that help in targeted marketing, and generate cluster-analysis details with explainable AI.
pip install tensorflow
pip install keras
pip install chart-studio
import numpy as np
import keras
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from sklearn import metrics
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, precision_score, f1_score, auc
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
print(tf.__version__)
#importing the dataset, set RowNumber as index
#load the csv file and make the data frame
df = pd.read_csv('Churn_Modelling.csv',index_col='RowNumber')
df.shape # 10,000 rows, 13 columns
df.head(2) #Exited is target column
#Check datatypes
df.info()
Surname, Gender and Geography are of object type.
#Check for missing values
df.isna().sum()
There are no missing values or other types of noise in the dataset.
df.describe().round(2)
# Split data into feature and target sets. CustomerId and Surname will not contribute to model building,
# hence we will drop these 2 columns as well
X=df.drop(labels=['CustomerId','Surname','Exited'], axis=1) # Feature Set
y=df['Exited'] # Target set
#target variable is y=df['Exited']
#look at distribution of exited and non-exited customers
sns.countplot(x="Exited", data=df)
labels = 'Exited', 'Retained'
sizes = [df.Exited[df['Exited']==1].count(), df.Exited[df['Exited']==0].count()]
explode = (0, 0.1)
fig1, ax1 = plt.subplots(figsize=(10, 8))
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
shadow=True, startangle=90)
ax1.axis('equal')
plt.title("Proportion of customer churned and retained", size = 20)
plt.show()
The dependent variable (Exited), the value that we are going to predict, indicates whether the customer leaves the bank (a binary variable: 0 if the customer stays, 1 if the customer exits).
The independent variables are the remaining 13 columns.
The dataset has only around 2,000 exited customers, while about 8,000 customers are still with the bank, so it is imbalanced towards retained customers.
Note: about 20% of the customers have churned, so a baseline model could simply predict that 20% of customers will churn. Given that 20% is a small number, we need to ensure the chosen model predicts this 20% with high accuracy: it is of more interest to the bank to identify and keep this group than to accurately predict which customers are retained.
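To make the imbalance concrete, here is a short sketch (with synthetic labels standing in for df['Exited']; in the notebook you would use the real column) of why a majority-class baseline already reaches about 80% accuracy:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: ~20% churners, as in the real dataset
rng = np.random.default_rng(0)
y = pd.Series(rng.choice([0, 1], size=10_000, p=[0.8, 0.2]), name="Exited")

churn_rate = y.mean()
# A model that always predicts the majority class scores 1 - churn_rate
baseline_accuracy = max(churn_rate, 1 - churn_rate)
print(f"churn rate: {churn_rate:.2%}, baseline accuracy: {baseline_accuracy:.2%}")
```

This is why accuracy alone is a misleading metric here, and why recall on the churned class matters.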
sns.countplot(x="Gender", data=df)
The bank has about 4,500 female customers and 5,500 male customers.
sns.countplot(x="Geography", data=df)
Most of the customers are from France; customers from Spain and Germany are each about half as numerous.
sns.countplot(x="Exited", hue="Gender", data=df)
The plot above shows that female customers have a higher propensity to exit the bank.
sns.countplot(x="Exited", hue="Geography", data=df)
df.select_dtypes(exclude='object').hist(figsize=(14,10),bins=20)
sns.countplot(x='Exited', data=df)
df['Exited'].value_counts()
corr = df.corr(numeric_only=True) # exclude the object columns
corr.style.background_gradient(cmap='Greens').format(precision=2)
#Rates of credit card’s usage according to gender
fig = px.parallel_categories(df, dimensions=['Gender', 'Geography', 'Exited'],
color="Exited", color_continuous_scale=px.colors.sequential.Inferno,
labels={'Gender':'Gender(Female,Male)', 'Exited':'Exited(0:No,1:Yes)'})
fig.update_layout(title_text="Gender-Geography-Exited-Not Exited Schema")
fig.show();
Around 20% of people exited. Proportionally more females exited (1,139 female vs. 898 male). Germans exited more both proportionally and numerically (448 female + 366 male). French females exited more in absolute numbers (460), while French customers overall stay with the bank more than all others (5,014 − 810 = 4,196).
fig = px.parallel_categories(df, dimensions=['Gender','HasCrCard',"IsActiveMember", 'Exited'],
color="Exited", color_continuous_scale=px.colors.sequential.Inferno,
labels={'HasCrCard':'Has Credit Card', 'Gender':'Gender(Female,Male)', 'Exited':'Exited(0:No,1:Yes)'})
fig.update_layout(title_text="Credit Card-Gender-Exited-Not Exited Schema")
fig.show();
In the dynamic presentation above we can see how these categories affect the exit decision: females and inactive members are more prone to exit. Credit card users are also slightly more prone to exit than non-users.
Customers from Germany have the highest propensity to exit the bank.
#Lets Check Distribution of exited/non-exited Customers as per the age
plt.figure(figsize=(15, 8))
sns.distplot(df['Age'][df['Exited']==0],color='blue',label='non-exited')
sns.distplot(df['Age'][df['Exited']==1],color='red',label='exited')
plt.show()
The age distribution of customers who exited the bank is roughly normal, while that of customers who stayed is right-skewed, indicating that most of the bank's existing customers are under 50 years of age. This may also indicate that older customers have exited the bank.
Now let's look at an important and powerful column: 'Geography'. We will visualize this column with Plotly, since it is an interactive visualization library.
France = float(df[df['Geography']=='France']['Geography'].count())
Spain = float(df[df['Geography']=='Spain']['Geography'].count())
Germany = float(df[df['Geography']=='Germany']['Geography'].count())
print(France+Spain+Germany)
import chart_studio.plotly as py
import plotly.graph_objects as go
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
data = dict(type='choropleth',
locations=['ESP','FRA','DEU'],
colorscale='YlGnBu',
text = ['Spain','France','Germany'],
z=[Spain,France,Germany], # keep the same country order as locations/text
colorbar={'title':'number in each geography'})
layout = dict(title='Counting the numbers of each nationality',
geo=dict(showframe=False,projection={'type':'natural earth'}))
choromap = go.Figure(data=[data],layout=layout)
iplot(choromap)
X.info()
# Geography and Gender are object type; we will convert them using one-hot encoding
X= pd.get_dummies(X)
X.info()
The object columns, Geography and Gender, have been converted to one-hot encoded columns.
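As a toy illustration (not the actual dataset) of what pd.get_dummies does here: each object column is expanded into one indicator column per category, while numeric columns pass through unchanged.

```python
import pandas as pd

# Toy frame standing in for the real feature set
toy = pd.DataFrame({"Geography": ["France", "Spain", "Germany"],
                    "Age": [40, 35, 50]})
encoded = pd.get_dummies(toy)
print(list(encoded.columns))
```

This is why Geography becomes three columns and Gender becomes two in the feature set above.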
#Lets Check first few rows of feature set
X.head()
from sklearn.model_selection import train_test_split
#test train split
test_size = 0.30 # taking a 70:30 train/test split
seed = 7 # random number seed for repeatability of the code
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)
#Check shape of train/test sets
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Normalisation: we scale the features that take running/continuous values using StandardScaler: CreditScore, Age, Tenure, Balance, NumOfProducts, EstimatedSalary.
We will not normalise the following features, as they take discrete values of either 0 or 1: HasCrCard, IsActiveMember, Geography_France, Geography_Germany, Geography_Spain, Gender_Female, Gender_Male.
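Before applying it to the real features, here is a minimal sketch of what StandardScaler computes on a toy column: each value is transformed to (x − mean) / std, with the statistics learned in fit().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-in for a single CreditScore column
X = np.array([[600.0], [650.0], [700.0]])
scaler = StandardScaler().fit(X)   # learns mean=650 and the (population) std
z = scaler.transform(X).ravel()    # (x - mean) / std for each value
print(scaler.mean_[0], z)
```

Fitting on the training set only, then transforming the test set with the same statistics (as the code below does), avoids leaking test-set information into the model.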
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
X_train[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']].head(2)
X_train.head(2)
scaler.fit(X_train[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']])
X_train_scaled=scaler.transform(X_train[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']])
# Transform test set on the same fit as train set
X_test_scaled=scaler.transform(X_test[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']])
# Put back scaled data into the dataframe for the columns which have been scaled while keeping other data intact
X_train[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']]=X_train_scaled
X_train.head(2)
X_test[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary']]=X_test_scaled
X_test.head(2)
# Convert Data into Numpy arrays
X_train_array=np.array(X_train)
X_test_array=np.array(X_test)
y_train_array=np.array(y_train)
y_test_array=np.array(y_test)
X_train_array.shape,X_test_array.shape,y_train_array.shape,y_test_array.shape#check shapes of array
# Initialize Sequential model
model = tf.keras.models.Sequential()
# Add Input layer to the model
model.add(tf.keras.Input(shape=(13,))) # 13 Features
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
# Hidden layers
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_1'))
model.add(tf.keras.layers.Dense(10, activation='relu', name='Layer_2'))
#Output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid', name='Output'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(X_train_array, y_train_array, validation_data=(X_test_array, y_test_array), epochs=150,
batch_size = 32)
model.predict(X_test_array)[:5] # Observe first 5 probabilities
th=0.5 # Threshold
y_test_preds = np.where(model.predict(X_test_array) > th, 1, 0)
y_test_preds[:5] # Observe First 5 predictions
# Confusion matrix with optimal Threshold on test set
metrics.confusion_matrix(y_test, y_test_preds)
# predict_classes was removed in recent TensorFlow; threshold the predicted probabilities instead
keras_predictions = (model.predict(X_test_array, batch_size=200, verbose=1) > 0.5).astype(int)
cm = confusion_matrix(y_test, keras_predictions)
labels = ['No Exited', 'Exited']
fig = plt.figure(figsize=(16,8))
fig.add_subplot(221)
plt.title("Confusion Matrix \n keras")
sns.heatmap(cm,annot=True,cmap="Blues",fmt="d",cbar=False)
plt.show()
print('Test Metrics at 0.5 Threshold with basic DNN model\n')
Test_Metrics_Basic_DNN=pd.DataFrame(data=[accuracy_score(y_test, y_test_preds),
recall_score(y_test, y_test_preds),
precision_score(y_test, y_test_preds),
f1_score(y_test, y_test_preds)], columns=['Basic DNN'],
index=["accuracy", "recall", "precision", "f1_score"])
print(Test_Metrics_Basic_DNN)
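Accuracy, recall, precision and F1 all depend on the 0.5 threshold; ROC-AUC instead summarizes ranking quality across all thresholds. A minimal sketch with stand-in labels and probabilities (in the notebook you would pass y_test and model.predict(X_test_array).ravel()):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Stand-in labels and predicted probabilities for illustration only
rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=1000), 0, 1)

auc_value = roc_auc_score(y_true, y_prob)
print(round(auc_value, 3))
```

An AUC of 0.5 means the model ranks churners no better than chance; values close to 1 mean churners almost always receive higher scores than retained customers.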

I want to focus on how to approach some characteristic elements of NNs, whose initialization, optimization and tuning can make the network much more powerful and accurate.
Parameters: these are the coefficients of the model, and they are chosen by the model itself. While learning, the algorithm optimizes these coefficients, i.e. the weights, according to a given optimization strategy, and returns an array of parameters which minimize the error. The only thing we have to do with these parameters is initialize them.
Hence, there are schemes to properly initialize the parameters depending on the activation function we employ. As we will use ReLU, we will use He initialization with a normal distribution.
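A quick numerical sketch of what He (normal) initialization does: weights are drawn from a normal distribution with standard deviation sqrt(2 / fan_in), which keeps activation variance roughly stable through ReLU layers.

```python
import numpy as np

# He normal initialization: N(0, sqrt(2 / fan_in))
fan_in = 13  # number of inputs to the layer, as in our first hidden layer
rng = np.random.default_rng(42)
W = rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_in, 10))

print(W.std())  # close to sqrt(2/13) ≈ 0.39
```

In Keras this corresponds to kernel_initializer='he_normal', which the weight-initialization model below uses.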
Hyperparameters: unlike parameters, these are elements we need to set ourselves, and the model will not update them according to the optimization strategy: manual intervention is always needed.
So for this specific task we will improve our model via:
Number of hidden layers: we need to test our model with more layers in order to see whether accuracy increases.
Activation function: the function through which we pass our weighted sum, in order to produce a meaningful output, namely a vector of probabilities or a 0–1 output. The major activation functions used here are ReLU and sigmoid.
BatchNormalization layer
Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Importantly, batch normalization works differently during training and during inference.
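A small sketch of that train/inference difference (assuming TensorFlow is available, as above): calling the layer with training=True normalizes with the current batch's statistics, while training=False uses the layer's moving averages.

```python
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
# A batch deliberately far from mean 0 / std 1
x = tf.constant(np.random.default_rng(0).normal(5.0, 3.0, size=(256, 4)),
                dtype=tf.float32)

train_out = bn(x, training=True)   # batch statistics -> output std ≈ 1
infer_out = bn(x, training=False)  # moving averages (at init: mean 0, var 1)

print(float(tf.math.reduce_std(train_out)))
```

During model.fit Keras sets training=True automatically and updates the moving averages, so model.predict later uses the learned statistics.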
Layer normalization layer.
Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. i.e. applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis.
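A small sketch of the difference: LayerNormalization normalizes each example across its own features, so every row comes out with mean ≈ 0 regardless of the other rows in the batch.

```python
import numpy as np
import tensorflow as tf

ln = tf.keras.layers.LayerNormalization()
# Two rows on very different scales
x = tf.constant([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 20.0, 30.0, 40.0]])
out = ln(x)

print(out.numpy().mean(axis=1))  # each row's mean ≈ 0
```

Because it does not depend on batch statistics, layer normalization behaves identically during training and inference.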
Improving the model using Dropout regularization
When a model is trained too closely to the training set, it performs much worse on the test set; this is called overfitting. Dropout regularization is a technique used to reduce overfitting. To combat it, we add a dropout layer after existing layers in our neural network. Let's see what happens to our neural network with dropout layers.
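Before wiring dropout into the model, a tiny sketch of the layer's behavior: during training a random fraction of activations is zeroed and the survivors are rescaled by 1/(1 − rate); at inference the layer passes inputs through unchanged.

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))

train_out = drop(x, training=True)   # values are either 0 or 1/(1-0.5) = 2
infer_out = drop(x, training=False)  # identical to the input

print(train_out.numpy())
```

The rescaling keeps the expected sum of activations the same with or without dropout, so no adjustment is needed at inference time.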

# Initialize Sequential model
model = tf.keras.models.Sequential()
# Add Input layer to the model
model.add(tf.keras.Input(shape=(13,))) # 13 Features
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
# Hidden layers
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_1'))
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_2'))
model.add(tf.keras.layers.Dense(10, activation='relu', name='Layer_3'))
#Output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid', name='Output'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train_array, y_train_array, validation_data=(X_test_array, y_test_array), epochs=150,
batch_size = 32)
th=0.5 # Threshold
y_test_preds = np.where(model.predict(X_test_array) > th, 1, 0)
print('Test Metrics at 0.5 Threshold with 3 Hidden layer DNN model\n')
Test_Metrics_3_HiddenLayer_DNN=pd.DataFrame(data=[accuracy_score(y_test, y_test_preds),
recall_score(y_test_array, y_test_preds),
precision_score(y_test_array, y_test_preds),
f1_score(y_test_array, y_test_preds)], columns=['3 Hidden Layer DNN'],
index=["accuracy", "recall", "precision", "f1_score"])
print(Test_Metrics_3_HiddenLayer_DNN)
# Confusion matrix with optimal Threshold on test set
metrics.confusion_matrix(y_test_array, y_test_preds)
# Initialize Sequential model
model = tf.keras.models.Sequential()
# Add Input layer to the model
model.add(tf.keras.Input(shape=(13,))) # 13 Features
# Batch Normalization Layer
#model.add(tf.keras.layers.BatchNormalization())
# Hidden layers
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_1'))
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_2'))
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10, activation='relu', name='Layer_3'))
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
#Output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid', name='Output'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(X_train_array, y_train_array, validation_data=(X_test_array, y_test_array), epochs=150,
batch_size = 32)
th=0.5 # Threshold
y_test_preds = np.where(model.predict(X_test_array) > th, 1, 0)
print('Test Metrics at 0.5 Threshold with Batch Norm after each hidden layer DNN model\n')
Test_Metrics_BatchNorm=pd.DataFrame(data=[accuracy_score(y_test, y_test_preds),
recall_score(y_test_array, y_test_preds),
precision_score(y_test_array, y_test_preds),
f1_score(y_test_array, y_test_preds)], columns=['BatchNorm Hidden layers'],
index=["accuracy", "recall", "precision", "f1_score"])
print(Test_Metrics_BatchNorm)
# Confusion matrix with optimal Threshold on test set
metrics.confusion_matrix(y_test_array, y_test_preds)
from keras import initializers
# Initialize Sequential model
model = tf.keras.models.Sequential()
# Add Input layer to the model
model.add(tf.keras.Input(shape=(13,))) # 13 Features
# Hidden layers
model.add(tf.keras.layers.Dense(13, kernel_initializer='he_normal', bias_initializer='Ones',activation='relu', name='Layer_1'))
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(13, kernel_initializer='he_normal',bias_initializer='Ones',activation='relu', name='Layer_2'))
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10,kernel_initializer='he_normal',bias_initializer='Ones', activation='relu', name='Layer_3'))
# Batch Normalization Layer
model.add(tf.keras.layers.BatchNormalization())
#Output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid', name='Output'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train_array, y_train_array, validation_data=(X_test_array, y_test_array), epochs=50,
batch_size = 32)
th=0.5 # Threshold
y_test_preds = np.where(model.predict(X_test_array) > th, 1, 0)
print('Test Metrics at 0.5 Threshold with Weight and Bias initialization & Batch Norm after each hidden layer DNN model\n')
Test_Metrics_Weight_Init=pd.DataFrame(data=[accuracy_score(y_test, y_test_preds),
recall_score(y_test_array, y_test_preds),
precision_score(y_test_array, y_test_preds),
f1_score(y_test_array, y_test_preds)], columns=['Weight Initialize'],
index=["accuracy", "recall", "precision", "f1_score"])
print(Test_Metrics_Weight_Init)
# Confusion matrix with optimal Threshold on test set
metrics.confusion_matrix(y_test_array, y_test_preds)

# Initialize Sequential model
model = tf.keras.models.Sequential()
# Add Input layer to the model
model.add(tf.keras.Input(shape=(13,))) # 13 Features
# Hidden layers
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_1'))
model.add(tf.keras.layers.Dense(13, activation='relu', name='Layer_2'))
# Dropout layer
model.add(tf.keras.layers.Dropout(0.5))
# Hidden layers
model.add(tf.keras.layers.Dense(10, activation='relu', name='Layer_3'))
# Dropout layer
model.add(tf.keras.layers.Dropout(0.3))
#Output layer
model.add(tf.keras.layers.Dense(1, activation='sigmoid', name='Output'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(X_train_array, y_train_array, validation_data=(X_test_array, y_test_array), epochs=100,
batch_size = 32, verbose=0)
th=0.5 # Threshold
y_test_preds = np.where(model.predict(X_test_array) > th, 1, 0)
print('Test Metrics at 0.5 Threshold Dropout DNN model\n')
Test_Metrics_DropOut=pd.DataFrame(data=[accuracy_score(y_test, y_test_preds),
recall_score(y_test_array, y_test_preds),
precision_score(y_test_array, y_test_preds),
f1_score(y_test_array, y_test_preds)], columns=['DropOut'],
index=["accuracy", "recall", "precision", "f1_score"])
print(Test_Metrics_DropOut)
# Confusion matrix with optimal Threshold on test set
metrics.confusion_matrix(y_test_array, y_test_preds)
Model_Comparison_df=Test_Metrics_Basic_DNN
Model_Comparison_df['3 Hidden Layer DNN']=Test_Metrics_3_HiddenLayer_DNN['3 Hidden Layer DNN']
Model_Comparison_df['BatchNorm Hidden layers']=Test_Metrics_BatchNorm['BatchNorm Hidden layers']
Model_Comparison_df['Weight Initialize']=Test_Metrics_Weight_Init['Weight Initialize']
Model_Comparison_df['DropOut']=Test_Metrics_DropOut['DropOut']
Model_Comparison_df
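Optionally, the comparison table reads more easily as grouped bars. A sketch with made-up numbers (in the notebook you would call Model_Comparison_df.plot.bar directly on the real table):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Illustrative values only, not results from the trained models
Model_Comparison_df = pd.DataFrame(
    {"Basic DNN": [0.85, 0.45, 0.70, 0.55],
     "DropOut":   [0.84, 0.50, 0.65, 0.57]},
    index=["accuracy", "recall", "precision", "f1_score"])

ax = Model_Comparison_df.plot.bar(rot=0, figsize=(10, 5))
ax.set_ylabel("score")
ax.set_title("Test metrics by model variant")
plt.tight_layout()
```

One bar group per metric makes it easy to see, for example, whether a variant trades precision for recall.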